Tree-based Ensemble Classifiers for High-dimensional Data

Authors

  • James J. Chen
  • Hojin Moon
  • Songjoon Baek
  • Mark C. K. Yang
  • Anne Chao
  • Y. C. Chen
Abstract

Building a classification model from thousands of available predictor variables with a relatively small sample size presents challenges for most traditional classification algorithms. When the number of samples is much smaller than the number of predictors, there can be a multiplicity of good classification models. An ensemble classifier combines multiple single classifiers to improve classification accuracy. This paper gives an overview of tree-based classifiers and compares the performance of three ensemble classifiers: random forest (RF), classification by ensembles from random partitions (CERP), and adaptive boosting (AdaBoost). Three single-tree algorithms are also evaluated: classification tree (CTree), classification rule with unbiased interaction selection and estimation (CRUISE), and quick, unbiased and efficient statistical tree (QUEST). The six tree-based classifiers are applied to five high-dimensional datasets. In all datasets, the three ensemble classifiers show much higher classification accuracy than the three single-tree algorithms, with the exception of the AdaBoost ensemble classifier in one dataset. RF and CERP are comparable in accuracy, and both of these bagging classifiers show higher accuracy than the AdaBoost boosting classifier. Among the three single-tree classifiers, QUEST generally shows higher accuracy than CTree and CRUISE.
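
To make the idea of combining single classifiers concrete, the following is a minimal sketch in plain Python. It is not the paper's RF, CERP, or AdaBoost implementation: it assumes a generic bagging-style scheme in which each base learner is a one-split tree (a stump) fitted on a bootstrap sample and a random subset of predictors, and the ensemble predicts by majority vote. All function names, parameters, and the toy data are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def majority_vote(labels):
    """Return the most common label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y, features):
    """Fit a one-split tree using only the given predictor indices.

    Returns (feature, threshold, left_label, right_label).
    """
    best = None  # (errors, feature, threshold, left_label, right_label)
    for j in features:
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            l_lab = majority_vote(left)  # left is never empty: t is an observed value
            r_lab = majority_vote(right) if right else l_lab
            errors = (sum(yi != l_lab for yi in left)
                      + sum(yi != r_lab for yi in right))
            if best is None or errors < best[0]:
                best = (errors, j, t, l_lab, r_lab)
    return best[1:]


def predict_stump(stump, row):
    j, t, l_lab, r_lab = stump
    return l_lab if row[j] <= t else r_lab


def fit_ensemble(X, y, n_trees=25, n_features=5, seed=0):
    """Fit n_trees stumps, each on a bootstrap sample and a random predictor subset."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]        # bootstrap sample of rows
        feats = rng.sample(range(p), min(n_features, p))  # random predictor subset
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return stumps


def predict_ensemble(stumps, row):
    """Combine the single classifiers by majority vote."""
    return majority_vote([predict_stump(s, row) for s in stumps])


# Toy usage: 20 samples, 100 predictors, only predictor 0 carries signal.
data_rng = random.Random(1)
y_toy = [i % 2 for i in range(20)]
X_toy = [[label + data_rng.gauss(0, 0.3)] + [data_rng.gauss(0, 1) for _ in range(99)]
         for label in y_toy]
model = fit_ensemble(X_toy, y_toy, n_trees=50, n_features=10)
predictions = [predict_ensemble(model, row) for row in X_toy]
```

The sketch resamples both rows and predictors, so individual stumps that never see the informative predictor can still be outvoted by those that do. Whether this helps on a given dataset depends on the data, and the toy example makes no claim about the accuracies reported in the paper.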


Similar resources

A novel hybrid method for vocal fold pathology diagnosis based on Russian language

In this paper, first, an initial feature vector for vocal fold pathology diagnosis is proposed. Then, for optimizing the initial feature vector, a genetic algorithm is proposed. Some experiments are carried out for evaluating and comparing the classification accuracies which are obtained by the use of the different classifiers (ensemble of decision tree, discriminant analysis and K-nearest neig...


Application of ensemble learning techniques to model the atmospheric concentration of SO2

In view of pollution prediction modeling, the study adopts homogeneous (random forest, bagging, and additive regression) and heterogeneous (voting) ensemble classifiers to predict the atmospheric concentration of sulphur dioxide. For model validation, results were compared against widely known single base classifiers such as support vector machine, multilayer perceptron, linear regression and re...


Combining Classifier Guided by Semi-Supervision

The article suggests an algorithm for regular classifier ensemble methodology. The proposed methodology is based on possibilistic aggregation to classify samples. The argued method optimizes an objective function that combines environment recognition, multi-criteria aggregation term and a learning term. The optimization aims at learning backgrounds as solid clusters in subspaces of the high...


High-Dimensional Unsupervised Active Learning Method

In this work, a hierarchical ensemble of projected clustering algorithm for high-dimensional data is proposed. The basic concept of the algorithm is based on the active learning method (ALM) which is a fuzzy learning scheme, inspired by some behavioral features of human brain functionality. High-dimensional unsupervised active learning method (HUALM) is a clustering algorithm which blurs the da...


A Novel Ensemble Approach for Anomaly Detection in Wireless Sensor Networks Using Time-overlapped Sliding Windows

One of the most important issues concerning the sensor data in the Wireless Sensor Networks (WSNs) is the unexpected data which are acquired from the sensors. Today, there are numerous approaches for detecting anomalies in the WSNs, most of which are based on machine learning methods. In this research, we present a heuristic method based on the concept of “ensemble of classifiers” of data minin...




Publication date: 2007